Yao Shu

Training Multi-Turn Search Agent via Contrastive Dynamic Branch Sampling

Feb 03, 2026

1S-DAug: One-Shot Data Augmentation for Robust Few-Shot Generalization

Jan 27, 2026

Controllable Concept Bottleneck Models

Jan 01, 2026

Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives

Sep 26, 2025

RIMO: An Easy-to-Evaluate, Hard-to-Solve Olympiad Benchmark for Advanced Mathematical Reasoning

Sep 09, 2025

Zeroth-Order Optimization is Secretly Single-Step Policy Optimization

Jun 17, 2025

On Path to Multimodal Historical Reasoning: HistBench and HistAgent

May 26, 2025

PAFT: Prompt-Agnostic Fine-Tuning

Feb 18, 2025

Refining Adaptive Zeroth-Order Optimization at Ease

Feb 03, 2025

Meta-Prompt Optimization for LLM-Based Sequential Decision Making

Feb 02, 2025